query strategy
Active Slice Discovery in Large Language Models
Zhang, Minhui, Ijner, Prahar, Wald, Yoav, Creager, Elliot
Large Language Models (LLMs) often exhibit systematic errors on specific subsets of data, known as error slices. For instance, a slice can correspond to a certain demographic, where a model does poorly in identifying toxic comments regarding that demographic. Identifying error slices is crucial to understanding and improving models, but it is also challenging. An appealing approach to reduce the amount of manual annotation required is to actively group errors that are likely to belong to the same slice, while using limited access to an annotator to verify whether the chosen samples share the same pattern of model mistake. In this paper, we formalize this approach as Active Slice Discovery and explore it empirically on a problem of discovering human-defined slices in toxicity classification. We examine the efficacy of active slice discovery under different choices of feature representations and active learning algorithms. On several slices, we find that uncertainty-based active learning algorithms are most effective, achieving competitive accuracy using 2-10% of the available slice membership information, while significantly outperforming baselines.
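The abstract reports that uncertainty-based active learning algorithms work best for active slice discovery. The snippet below is a minimal sketch of that general idea and not the authors' implementation. It assumes a binary classifier that estimates, for each model error, the probability that the error belongs to the slice. Uncertainty sampling then asks the annotator about the unverified errors whose probability is closest to 0.5. The function name `uncertainty_query` and the toy probabilities are hypothetical.

```python
from typing import List, Sequence, Set


def uncertainty_query(probs: Sequence[float], labeled: Set[int], k: int) -> List[int]:
    """Return the k unlabeled indices whose predicted slice-membership
    probability is closest to 0.5, i.e. the most uncertain ones."""
    # Only errors whose slice membership has not yet been verified are candidates.
    candidates = [i for i in range(len(probs)) if i not in labeled]
    # Distance from the 0.5 decision boundary; ties broken by index for determinism.
    candidates.sort(key=lambda i: (abs(probs[i] - 0.5), i))
    return candidates[:k]


# Hypothetical slice-membership probabilities for six misclassified comments.
probs = [0.95, 0.52, 0.10, 0.47, 0.70, 0.49]
print(uncertainty_query(probs, labeled={5}, k=2))  # [1, 3]
```

After each round, the annotator's answers would be added to `labeled` and the classifier retrained before the next query.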
TActiLE: Tiny Active LEarning for wearable devices
Pavan, Massimo, Galimberti, Claudio, Roveri, Manuel
Tiny Machine Learning (TinyML) algorithms have seen extensive use in recent years, enabling wearable devices to be not only connected but also genuinely intelligent by running machine learning (ML) computations directly on-device. Among such devices, smart glasses have particularly benefited from TinyML advancements. TinyML facilitates the on-device execution of the inference phase of ML algorithms on embedded and wearable devices, and more recently, it has expanded into On-device Learning (ODL), which allows both inference and learning phases to occur directly on the device. The application of ODL techniques to wearable devices is particularly compelling, as it enables the development of more personalized models that adapt based on the data of the user. However, one of the major challenges of ODL algorithms is the scarcity of labeled data collected on-device. In smart wearable contexts, requiring users to manually label large amounts of data is often impractical and could lead to user disengagement with the technology. To address this issue, this paper explores the application of Active Learning (AL) techniques, i.e., techniques that aim to minimize the labeling effort by actively selecting, from a large quantity of unlabeled data, only a small subset to be labeled and added to the training set of the algorithm. In particular, we propose TActiLE, a novel AL algorithm that selects, from the stream of on-device sensor data, the samples that would most improve the ML algorithm once labeled by the user. TActiLE is the first Active Learning technique specifically designed for the TinyML context. We evaluate its effectiveness and efficiency through experiments on multiple image classification datasets. The results demonstrate its suitability for tiny and wearable devices.
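The abstract does not state TActiLE's selection criterion. For context only, the snippet below sketches a generic stream-based selective-sampling loop of the kind used on-device. Each incoming sample is seen once and is never stored. The user is asked for a label only when the model's top-class probability falls below a threshold, and only until a labeling budget runs out. The confidence rule, the threshold, the budget, and the name `select_from_stream` are all hypothetical and are not taken from TActiLE.

```python
from typing import Iterable, List, Sequence


def select_from_stream(
    prob_stream: Iterable[Sequence[float]], threshold: float, budget: int
) -> List[int]:
    """Return the stream positions at which the user would be asked for a label.

    A sample is queried when the model's highest class probability is below
    `threshold`; querying stops once `budget` labels have been requested.
    """
    queried: List[int] = []
    for t, probs in enumerate(prob_stream):
        if len(queried) >= budget:
            break  # labeling budget exhausted; the rest of the stream is skipped
        if max(probs) < threshold:
            queried.append(t)  # low confidence: ask the user for this label
    return queried


# Hypothetical per-sample class probabilities arriving from a sensor stream.
stream = [[0.90, 0.10], [0.95, 0.05], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50], [0.58, 0.42]]
print(select_from_stream(stream, threshold=0.6, budget=2))  # [2, 4]
```

Because the loop keeps only a list of indices and no buffered samples, its memory footprint stays constant. That property matters on microcontroller-class hardware.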
DQS: A Low-Budget Query Strategy for Enhancing Unsupervised Data-driven Anomaly Detection Approaches
Correia, Lucas, Goos, Jan-Christoph, Bäck, Thomas, Kononova, Anna V.
Truly unsupervised approaches for time series anomaly detection are rare in the literature. Those that exist suffer from poorly set thresholds, which hamper detection performance, while others, despite claiming to be unsupervised, need to be calibrated using a labelled data subset, which is often not available in the real world. This work integrates active learning with an existing unsupervised anomaly detection method by selectively querying the labels of multivariate time series, which are then used to refine the threshold selection process. To achieve this, we introduce a novel query strategy called the dissimilarity-based query strategy (DQS). DQS aims to maximise the diversity of queried samples by evaluating the similarity between anomaly scores using dynamic time warping. We assess the detection performance of DQS in comparison to other query strategies and explore the impact of mislabelling, a topic that is underexplored in the literature. Our findings indicate that DQS performs best in small-budget scenarios, though the other query strategies appear to be more robust to mislabelling. Therefore, in the real world, the choice of query strategy depends on the expertise of the oracle and the number of samples they are willing to label. Regardless, all query strategies outperform the unsupervised threshold even in the presence of mislabelling. Thus, whenever it is feasible to query an oracle, employing an active learning-based threshold is recommended.
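DQS compares anomaly-score series with dynamic time warping (DTW) and favours dissimilar ones. The snippet below is a minimal sketch of that idea. The DTW recurrence is the classic one with an absolute-difference local cost. The selection rule is our own assumption, because the exact DQS rule is not given here. It seeds the query set with the series farthest in total from all others, then greedily adds the series whose nearest already-chosen series is farthest away (farthest-point selection). The names `dtw` and `dissimilarity_query` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| local cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dissimilarity_query(scores: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Greedily pick up to `budget` anomaly-score series that are mutually dissimilar under DTW."""
    n = len(scores)
    if n == 0 or budget <= 0:
        return []
    dist = [[dtw(scores[i], scores[j]) for j in range(n)] for i in range(n)]
    # Seed: the series with the largest total DTW distance to all others.
    chosen = [max(range(n), key=lambda i: sum(dist[i]))]
    while len(chosen) < min(budget, n):
        rest = [i for i in range(n) if i not in chosen]
        # Farthest-point step: maximise the distance to the nearest chosen series.
        chosen.append(max(rest, key=lambda i: min(dist[i][c] for c in chosen)))
    return chosen


# Hypothetical anomaly-score series for four time-series windows.
scores = [[0, 0, 0, 0], [0, 0, 0, 0.1], [0, 1, 5, 1], [0, 0, 2, 0]]
print(dissimilarity_query(scores, budget=3))  # [2, 0, 3]
```

Series 1 is skipped because it is nearly identical to series 0 under DTW, so labelling it would add little new information about where the threshold should sit.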
Declarative Techniques for NL Queries over Heterogeneous Data
Khabiri, Elham, Kephart, Jeffrey O., Heath, Fenno F. III, Jayaraman, Srideepika, Tipu, Fateh A., Li, Yingjie, Shah, Dhruv, Fokoue, Achille, Bhamidipaty, Anu
In many industrial settings, users wish to ask questions in natural language, the answers to which require assembling information from diverse structured data sources. With the advent of Large Language Models (LLMs), applications can now translate natural language questions into a set of API calls or database calls, execute them, and combine the results into an appropriate natural language response. However, these applications remain impractical in realistic industrial settings because they do not cope with the data source heterogeneity that typifies such environments. In this work, we simulate the heterogeneity of real industry settings by introducing two extensions of the popular Spider benchmark dataset that require a combination of database and API calls. Then, we introduce a declarative approach to handling such data heterogeneity and demonstrate that it copes with data source heterogeneity significantly better than state-of-the-art LLM-based agentic or imperative code generation systems. Our augmented benchmarks are available to the research community.
Cold-Start Active Correlation Clustering
Aronsson, Linus, Wu, Han, Chehreghani, Morteza Haghir
We study active correlation clustering where pairwise similarities are not provided upfront and must be queried in a cost-efficient manner through active learning. Specifically, we focus on the cold-start scenario, where no true initial pairwise similarities are available for active learning. To address this challenge, we propose a coverage-aware method that encourages diversity early in the process. We demonstrate the effectiveness of our approach through several synthetic and real-world experiments.
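In the standard weighted formulation of correlation clustering, a positive pairwise similarity means "same cluster" and a negative one means "different clusters". A clustering pays the magnitude of every similarity it disagrees with. The sketch below computes that cost over the pairs queried so far. It also shows a simple coverage heuristic for a cold start: prefer pairs whose objects have appeared in the fewest queries, so that early queries touch as many objects as possible. This heuristic is our own hypothetical illustration of "encouraging diversity early" and is not the authors' coverage-aware method. The names `clustering_cost` and `coverage_first_pairs` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def clustering_cost(labels: Sequence[int], sim: Dict[Pair, float]) -> float:
    """Weighted correlation-clustering disagreement over the known pairs."""
    cost = 0.0
    for (i, j), s in sim.items():
        same = labels[i] == labels[j]
        if same and s < 0:
            cost += -s  # dissimilar pair placed in the same cluster
        elif not same and s > 0:
            cost += s  # similar pair split across clusters
    return cost


def coverage_first_pairs(n: int, queried: Set[Pair], k: int) -> List[Pair]:
    """Pick up to k unqueried pairs (i < j), preferring rarely queried objects."""
    counts = [0] * n
    for i, j in queried:
        counts[i] += 1
        counts[j] += 1
    chosen: List[Pair] = []
    for _ in range(k):
        candidates = [p for p in combinations(range(n), 2)
                      if p not in queried and p not in chosen]
        if not candidates:
            break
        # Lowest combined query count first; ties broken by pair order.
        best = min(candidates, key=lambda p: (counts[p[0]] + counts[p[1]], p))
        chosen.append(best)
        counts[best[0]] += 1
        counts[best[1]] += 1
    return chosen


sim = {(0, 1): 1.0, (0, 2): -1.0, (1, 2): 0.5, (2, 3): 2.0}
print(clustering_cost([0, 0, 1, 1], sim))      # 0.5
print(coverage_first_pairs(4, set(), k=2))     # [(0, 1), (2, 3)]
```

With no similarities known, the first two queries already cover all four objects. A purely random choice of pairs could instead query the same object repeatedly.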
No Free Lunch in Active Learning: LLM Embedding Quality Dictates Query Strategy Success
Rauch, Lukas, Wirth, Moritz, Huseljic, Denis, Herde, Marek, Sick, Bernhard, Aßenmacher, Matthias
The advent of large language models (LLMs) capable of producing general-purpose representations lets us revisit the practicality of deep active learning (AL): By leveraging frozen LLM embeddings, we can mitigate the computational costs of iteratively fine-tuning large backbones. This study establishes a benchmark and systematically investigates the influence of LLM embedding quality on query strategies in deep AL. We employ five top-performing models from the massive text embedding benchmark (MTEB) leaderboard and two baselines for ten diverse text classification tasks. Our findings reveal key insights: First, initializing the labeled pool using diversity-based sampling synergizes with high-quality embeddings, boosting performance in early AL iterations. Second, the choice of the optimal query strategy is sensitive to embedding quality. While the computationally inexpensive Margin sampling can achieve performance spikes on specific datasets, we find that strategies like Badge exhibit greater robustness across tasks. Importantly, their effectiveness is often enhanced when paired with higher-quality embeddings. Our results emphasize the need for context-specific evaluation of AL strategies, as performance heavily depends on embedding quality and the target task.
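Margin sampling, the computationally inexpensive strategy mentioned above, queries the pool samples whose two most probable classes are closest in predicted probability. The snippet below is a minimal sketch over pre-computed class probabilities, for example from a lightweight classifier head trained on frozen embeddings. The function name `margin_query` and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def margin_query(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool samples with the smallest top-2 margin.

    Each entry of `probs` must contain at least two class probabilities.
    """
    def margin(p: Sequence[float]) -> float:
        top1, top2 = sorted(p, reverse=True)[:2]
        return top1 - top2

    # Smallest margin = most ambiguous between its two leading classes.
    order = sorted(range(len(probs)), key=lambda i: (margin(probs[i]), i))
    return order[:k]


pool = [
    [0.70, 0.20, 0.10],  # margin about 0.50
    [0.40, 0.38, 0.22],  # margin about 0.02
    [0.34, 0.33, 0.33],  # margin about 0.01
    [0.90, 0.05, 0.05],  # margin about 0.85
]
print(margin_query(pool, k=2))  # [2, 1]
```

The only per-sample work is a sort over class probabilities, so the strategy stays cheap even for large pools. It considers each sample in isolation, however, and unlike diversity-aware strategies such as Badge it can select redundant batches.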